Text information hiding and recovery via wavelet digital watermarking method

According to the wavelet digital watermarking method, a wavelet text hiding algorithm is presented for hiding text information in a signal with white noise, together with the corresponding recovery algorithm for obtaining the text information from a synthesized signal. First, the wavelet text hiding algorithm is introduced, and an example demonstrates how to hide text information in a signal s with white noise ε, where s = f(x) + ε and f(x) is a function such as sin x or cos x. A synthesized signal $\tilde{s}$ is obtained by the wavelet text hiding algorithm. Then the corresponding text recovery approach is introduced, and an example shows how the text information is recovered from the synthesized signal $\tilde{s}$. Figures from the example show that the wavelet text hiding algorithm and its recovery are feasible. Moreover, the roles of the wavelet function, noise, embedding mode and embedding position in hiding and recovering the text information are analyzed, together with their implications for security. A total of 1000 groups of English texts with different lengths are chosen to illustrate the computational complexity and running time of the algorithms. The social application of this approach is explained by a system architecture figure. Finally, some future directions for our follow-up study are discussed.


Scientific Reports
| (2023) 13:9532 | https://doi.org/10.1038/s41598-023-36759-0 www.nature.com/scientificreports/
constantly. The multi-resolution analysis of the wavelet transform is also widely used in information security, for example in encrypting audio files with the integer wavelet transform (IWT) and hand geometry 21 , hiding reversible data in encrypted images with the IWT and a chaotic system 22 , and visible digital watermarking with the integer wavelet transform 23 .
The safe transmission of text information is of great significance in business activities, text communication and so on. In these activities, we hope that the most important information can be delivered by a beautiful melody without being detected by others. In this paper, an approach is presented to hide and recover text information in a noisy signal by the wavelet transform. This paper is organized as follows. In Section "Preliminary", some preliminaries are given for our discussion, including orthogonal multi-resolution analysis and digital watermarking based on the wavelet transform. In Section "An algorithm for hiding a text information", an algorithm for hiding text information is given by the wavelet method, and an example demonstrates how to hide text information in a signal. In Section "Recovery algorithm for text information", an approach is presented to recover the text information from the synthetic signal constructed in Section "An algorithm for hiding a text information". Moreover, the roles of the wavelet function, noise, computational complexity and running time, embedding mode and embedding position in hiding and recovering the text information are analyzed. Some figures are shown for our discussion.

Preliminary
Multi-resolution analysis 13,17 . Suppose a sequence of closed subspaces $\{V_j\}_{j\in\mathbb{Z}}$ of $L^2(\mathbb{R})$ satisfies the following properties:
(1) $V_j \subset V_{j+1}$ for all $j \in \mathbb{Z}$;
(2) $\overline{\bigcup_{j\in\mathbb{Z}} V_j} = L^2(\mathbb{R})$ and $\bigcap_{j\in\mathbb{Z}} V_j = \{0\}$;
(3) $f(x) \in V_j \iff f(2x) \in V_{j+1}$;
(4) there exists a function $\varphi(x)$ such that $\{\varphi(x-k) : k \in \mathbb{Z}\}$ is an orthogonal basis of $V_0$.
Then the sequence $\{V_j\}$ generates an orthogonal multi-resolution analysis, where $V_j = \overline{\operatorname{span}}\{\varphi_{j,k}(x) = 2^{j/2}\varphi(2^j x - k) : k \in \mathbb{Z}\}$, with the closure taken in $L^2(\mathbb{R})$. Note 1: Properties (1) to (4) are, respectively, monotonicity, asymptotic completeness, scaling regularity and the existence of an orthogonal basis. The information of a signal can be encoded at resolution level j in each subspace $V_j$. The space spanned by the scaling functions at a higher resolution level contains the space spanned at a lower resolution level (more details can be found in 13,17 ).
For each integer $j \in \mathbb{Z}$, there exists an orthogonal complement $W_j$ of $V_j$ in $V_{j+1}$, namely $W_j = \overline{\operatorname{span}}\{\psi_{j,k}(x) = 2^{j/2}\psi(2^j x - k) : k \in \mathbb{Z}\}$. The function $\varphi(x) \in L^2(\mathbb{R})$ is called the scaling function, and $\psi(x) \in L^2(\mathbb{R})$ is the wavelet function corresponding to $\varphi(x)$. The scaling function $\varphi(x)$ and the wavelet $\psi(x)$ satisfy the two-scale equations
$$\varphi(x) = \sqrt{2}\sum_{k\in\mathbb{Z}} p_k\,\varphi(2x-k), \qquad \psi(x) = \sqrt{2}\sum_{k\in\mathbb{Z}} q_k\,\varphi(2x-k),$$
where the sequences $\{p_k\}$ and $\{q_k\}$ are called the low-pass filter and high-pass filter of $\varphi(x)$ and $\psi(x)$, respectively.
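As a minimal worked instance, assume the normalization $\varphi(x) = \sqrt{2}\sum_k p_k\varphi(2x-k)$ (and likewise for $\psi$). The Haar system satisfies the two-scale equations exactly with two-tap filters. It is the simplest orthogonal wavelet. It is used here only for illustration; the paper's examples use 'db2'.

```latex
\begin{aligned}
&\varphi(x) = \chi_{[0,1)}(x), \qquad \psi(x) = \chi_{[0,1/2)}(x) - \chi_{[1/2,1)}(x),\\
&p_0 = p_1 = \tfrac{1}{\sqrt{2}}, \qquad q_0 = \tfrac{1}{\sqrt{2}}, \quad q_1 = -\tfrac{1}{\sqrt{2}},\\
&\sqrt{2}\,\bigl(p_0\,\varphi(2x) + p_1\,\varphi(2x-1)\bigr) = \chi_{[0,1/2)}(x) + \chi_{[1/2,1)}(x) = \varphi(x),\\
&\sqrt{2}\,\bigl(q_0\,\varphi(2x) + q_1\,\varphi(2x-1)\bigr) = \chi_{[0,1/2)}(x) - \chi_{[1/2,1)}(x) = \psi(x).
\end{aligned}
```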
The decomposition and reconstruction algorithms play an important role in applications of wavelet analysis. For a signal $f(x) \in V_{j+1} \subset L^2(\mathbb{R})$ and real filters, the decomposition algorithm is
$$c_{j,k} = \sum_{n\in\mathbb{Z}} p_{n-2k}\,c_{j+1,n}, \qquad d_{j,k} = \sum_{n\in\mathbb{Z}} q_{n-2k}\,c_{j+1,n},$$
and the reconstruction formula is
$$c_{j+1,n} = \sum_{k\in\mathbb{Z}} \left(p_{n-2k}\,c_{j,k} + q_{n-2k}\,d_{j,k}\right).$$
The coefficients $c_{j,k}$ capture the low-frequency information of the signal $f(x)$, and the coefficients $d_{j,k}$ capture its high-frequency information.
Digital watermarking based on wavelet transform. Digital watermarking has become a focus of security research on multimedia information, and it is also an important branch of information hiding research. Digital watermarking technology is mainly used for ticket anti-counterfeiting, copyright protection, tamper indication and hidden identification. The ticket anti-counterfeiting watermark is a special kind of watermark, mainly used to prevent counterfeiting of printed bills, electronic bills and various certificates. The copyright watermark is one of the most studied digital watermarks at present. Digital works are both goods and works of knowledge. Because of this duality, copyright watermarking mainly emphasizes invisibility and robustness but requires relatively little data. The tamper-indication watermark is a fragile watermark, which aims to verify the integrity and authenticity of the original document signal. The purpose of the hidden identification watermark is to hide the important labels of confidential data and to restrict the use of confidential data by illegal users. Digital watermarking based on the transform domain is the mainstream of current digital watermarking research, and the wavelet transform in particular is widely used, for example in digital audio watermarking 24 , digital ECG signal watermarking 25 and color image watermarking 26,27 . A digital watermarking algorithm based on the discrete wavelet transform is briefly introduced in this section. A one-dimensional signal s can be decomposed into a low-frequency component $c_1$ and a high-frequency component $d_1$ by a first discrete wavelet transform. The low-frequency component $c_1$ can then be decomposed into a low-frequency component $c_2$ and a high-frequency component $d_2$ by a second discrete wavelet transform.
Analogously, a low-frequency component $c_k$ and k high-frequency components $d_1, d_2, \dots, d_k$ are obtained after the k-th discrete wavelet transform (see the left of Fig. 1). The low-frequency component $c_k$ is an approximation of the original signal, and the high-frequency components $d_1, d_2, \dots, d_k$ are the details in different frequency bands. After an initial position is chosen, a watermark signal can be embedded in the low-frequency component $c_k$ and the high-frequency components $d_1, d_2, \dots, d_k$. A watermarked signal is then obtained by the inverse discrete wavelet transform (see the right of Fig. 1). If robustness or encryption must be considered, approaches such as weighting the low-frequency and high-frequency coefficients differently or embedding an encrypted watermark can be adopted. Many researchers have studied these problems.
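The decomposition and reconstruction formulas from the Preliminary section can be sketched in plain Python for the Haar filters $p_0 = p_1 = 1/\sqrt{2}$, $q_0 = -q_1 = 1/\sqrt{2}$. This is an illustrative assumption, not the paper's implementation, which uses 'db2' in its examples. The helper names `wavedec` and `waverec` are hypothetical. They return and accept the coefficient ordering $[c_k, d_k, \dots, d_1]$. The signal length must be divisible by $2^k$.

```python
import math

SQRT_HALF = 1 / math.sqrt(2)   # Haar filters: p = [1/sqrt2, 1/sqrt2], q = [1/sqrt2, -1/sqrt2]


def haar_step(x):
    """One decomposition step: c[k] = p0*x[2k] + p1*x[2k+1], d[k] = q0*x[2k] + q1*x[2k+1]."""
    if len(x) % 2:
        raise ValueError("length must be even at every level")
    half = len(x) // 2
    c = [(x[2 * k] + x[2 * k + 1]) * SQRT_HALF for k in range(half)]
    d = [(x[2 * k] - x[2 * k + 1]) * SQRT_HALF for k in range(half)]
    return c, d


def haar_inverse_step(c, d):
    """One reconstruction step: x[n] = sum_k p[n-2k]*c[k] + q[n-2k]*d[k]."""
    x = []
    for ck, dk in zip(c, d):
        x.append((ck + dk) * SQRT_HALF)   # n = 2k:   p0*c + q0*d
        x.append((ck - dk) * SQRT_HALF)   # n = 2k+1: p1*c + q1*d
    return x


def wavedec(x, levels):
    """k-level DWT; returns [c_k, d_k, ..., d_1], where d_1 is the finest band."""
    details, c = [], list(x)
    for _ in range(levels):
        c, d = haar_step(c)
        details.append(d)
    return [c] + details[::-1]


def waverec(coeffs):
    """Inverse of wavedec: rebuild from the coarsest level down to the signal."""
    c = coeffs[0]
    for d in coeffs[1:]:
        c = haar_inverse_step(c, d)
    return c
```

For example, `wavedec(x, 5)` on a length-512 signal returns $c_5$ with 16 samples and $d_1$ with 256 samples. Adding a watermark to `coeffs[-1]` before calling `waverec` corresponds to embedding it in $d_1$.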

An algorithm for hiding a text information
According to the above introduction, the idea of the public-key mechanism and the wavelet digital watermarking method play an important role in ensuring information security. In this section, we introduce a text information hiding algorithm for secure information transmission based on the wavelet transform and the idea of the public-key mechanism. Our main idea is to blend white noise with the text information and embed the text information in a signal as a watermark. The algorithm is as follows.
Wavelet text hiding algorithm (WTHA).
Step one, establish a data set (letter database) for English text.
Step two, encode the text information to be transmitted to generate an array.
Step three, select a signal containing noise and decompose it to k levels by the discrete wavelet transform (DWT), obtaining the low-frequency coefficient $c_k$ and the high-frequency coefficients $d_1, d_2, \dots, d_k$.
Step four, apply a linear transform to the text code so that it matches the characteristics of a chosen high-frequency coefficient, and add the transformed code to that high-frequency coefficient at an appropriate position.
Step five, reconstruct the signal from the new high-frequency coefficients and the low-frequency coefficient by the wavelet reconstruction formula, generating a signal that carries the text code.
Note 2: The linear transform in Step four can be a simple invertible transform. To further improve security, more general reversible linear transformations can also be considered. Adding segments of the text code to different high-frequency coefficients is another feasible option. These approaches will be discussed in our follow-up study. In this paper, the text code is added to only one high-frequency coefficient, which is relatively simple and fast.
The above steps hide the text information in a signal with white noise. This approach is called the wavelet text hiding algorithm. Its diagram is shown in Fig. 2, where the text information is embedded in the high-frequency coefficients $d_1$.
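The five WTHA steps can be sketched as follows under several hypothetical choices, none of which come from the paper:
- a Haar wavelet with five levels;
- a linear transform $T(v) = a v + b$ with $a = 0.5$, $b = 0$;
- a start position of 50 in $d_1$;
- letter codes 'A' to 'Z' mapped to 1 to 26, 'a' to 'z' mapped to 27 to 52, and the space mapped to 53 (the space code is an assumption).

The host is $s = \sin x + \varepsilon$, as in the abstract, with seeded Gaussian noise of standard deviation 0.02. The function names are illustrative.

```python
import math
import random
import string

R2 = 1 / math.sqrt(2)
LETTERS = string.ascii_uppercase + string.ascii_lowercase + " "   # Step one: assumed database
ENC = {ch: i + 1 for i, ch in enumerate(LETTERS)}                   # 'A'->1, 'a'->27, ' '->53 (assumed)


def haar_step(x):
    n = len(x) // 2
    return ([(x[2 * k] + x[2 * k + 1]) * R2 for k in range(n)],
            [(x[2 * k] - x[2 * k + 1]) * R2 for k in range(n)])


def haar_inverse_step(c, d):
    out = []
    for ck, dk in zip(c, d):
        out += [(ck + dk) * R2, (ck - dk) * R2]
    return out


def noisy_sine(n=512, sigma=0.02, seed=0):
    """Host signal s = sin x + white noise over one period (illustrative parameters)."""
    rng = random.Random(seed)
    return [math.sin(2 * math.pi * i / n) + rng.gauss(0, sigma) for i in range(n)]


def wtha(signal, text, levels=5, pos=50, a=0.5, b=0.0):
    """Hide `text` in d_1 of `signal`; levels, pos, a and b are hypothetical choices."""
    code = [ENC[ch] for ch in text]                     # Step two: text -> array
    c, details = list(signal), []
    for _ in range(levels):                             # Step three: k-level DWT
        c, d = haar_step(c)
        details.append(d)                               # details[0] is d_1
    if pos + len(code) > len(details[0]):
        raise ValueError("text does not fit at this position")
    for i, v in enumerate(code):                        # Step four: add T(v) = a*v + b
        details[0][pos + i] += a * v + b
    for d in reversed(details):                         # Step five: reconstruct from c_k, d_k..d_1
        c = haar_inverse_step(c, d)
    return c, len(code)
```

With these settings, `wtha(noisy_sine(), "Text Hiding and Recovery")` changes exactly samples 100 to 147. This is because each Haar detail coefficient $d_1[k]$ touches only samples 2k and 2k + 1. With the four-tap 'db2' filter, the changed region would be slightly wider.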
The following example is given for illustrating the wavelet text hiding algorithm.

Example 1
Firstly, according to Algorithm 1, an English letter database is established.
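The letter database of Example 1 is not listed in the text. However, the quoted codes are consistent with a simple alphabetical table. The recovered array in Example 2 begins with 20, 31 for "Te", and the failed 'db3' recovery maps 6, 16, 6, 4 to "FPFD". Below is a minimal sketch of such a table. The code for the space character is not given in the paper; 53 is an assumption.

```python
import string

# Hypothetical letter database inferred from the codes quoted in the paper:
# 'A'..'Z' -> 1..26 and 'a'..'z' -> 27..52 ('T' -> 20, 'e' -> 31, 'F' -> 6, 'P' -> 16).
# The code of the space character is not stated in the paper; 53 is an assumption.
LETTERS = string.ascii_uppercase + string.ascii_lowercase + " "
ENCODE = {ch: i + 1 for i, ch in enumerate(LETTERS)}
DECODE = {code: ch for ch, code in ENCODE.items()}


def encode(text):
    """Step two of the WTHA: map each character to its integer code."""
    return [ENCODE[ch] for ch in text]


def decode(codes):
    """Inverse lookup; codes outside the database (e.g. after a failed recovery) become '?'."""
    return "".join(DECODE.get(c, "?") for c in codes)
```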
The raw signal s is decomposed into five levels, giving the low-frequency coefficient $c_5$ and the high-frequency coefficients $d_1, d_2, d_3, d_4, d_5$. The results are shown in Fig. 3.
Fourthly, a linear transform is applied to the array from Step two, and the transformed array is added to the high-frequency coefficient at an appropriately chosen position, giving the new high-frequency coefficients $d_1^*$. Finally, the coefficients $c_5$ and $d_1^*, d_2, d_3, d_4, d_5$ are reconstructed to generate a new signal $\tilde{s}$ carrying the array information. The result is shown in Fig. 3.
The above procedure hides the text information "Text Hiding and Recovery" in a noisy signal. Comparing 'original signal with a noise' and 'The signal with a noise and hided text information' directly in Fig. 4, it is not easy to see any difference between them. In other words, the hidden text information cannot easily be observed in the signal directly. Calculating the error signal reveals significant errors between nodes 100 and 140 (see 'error signal' in Fig. 4). So the text information "Text Hiding and Recovery" is hidden between node 100 and node 140.
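The localization argument above amounts to thresholding the error signal $\tilde{s} - s$. Below is a minimal sketch. The function names and the tolerance are hypothetical, and the check assumes both the original and the synthesized signal are available.

```python
def error_signal(s, s_tilde):
    """Element-wise error e[i] = s_tilde[i] - s[i]."""
    return [b - a for a, b in zip(s, s_tilde)]


def error_support(s, s_tilde, tol=1e-9):
    """First and last node where |e[i]| exceeds tol, or None if the signals agree."""
    idx = [i for i, e in enumerate(error_signal(s, s_tilde)) if abs(e) > tol]
    return (idx[0], idx[-1]) if idx else None
```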

Recovery algorithm for text information
In the previous section, a hiding algorithm for text information based on the wavelet transform was introduced. In this section, a recovery algorithm is proposed for the hidden information.

Wavelet text recovery algorithm (WTRA).
Step one, the signal $\tilde{s}$ carrying text information is decomposed into several levels by the DWT to obtain the low-frequency coefficient and the high-frequency coefficients.
Step two, according to the chosen position, the data containing the text information are extracted from the designated high-frequency coefficient.
Finally, the text information is restored from the extracted data by an inverse transform and the English letter database. The text information hidden in the signal can be restored by the above steps. This approach is called the wavelet text information recovery algorithm, and its diagram is shown in Fig. 5.
Note 3: The inverse of the linear transform used in Step four of the WTHA must be applied to the extracted data to recover the initial code.
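The WTHA and WTRA together can be sketched as a round trip. The sketch makes several hypothetical assumptions:
- a Haar wavelet with five levels;
- embedding in $d_1$ starting at position 50;
- $T(v) = 0.5v$ as the linear transform;
- the alphabetical letter table with the space mapped to 53 (assumed).

Recovery inverts $T$ and rounds to the nearest integer code. This rounding is an assumption of the sketch. It only works when the noise and signal content of $d_1$ at the embedding positions stay well below $a/2$. The paper does not state how the receiver separates the code from the noise.

```python
import math
import random
import string

R2 = 1 / math.sqrt(2)
LETTERS = string.ascii_uppercase + string.ascii_lowercase + " "   # assumed letter database
ENC = {ch: i + 1 for i, ch in enumerate(LETTERS)}
DEC = {v: k for k, v in ENC.items()}


def haar_step(x):
    n = len(x) // 2
    return ([(x[2 * k] + x[2 * k + 1]) * R2 for k in range(n)],
            [(x[2 * k] - x[2 * k + 1]) * R2 for k in range(n)])


def haar_inverse_step(c, d):
    out = []
    for ck, dk in zip(c, d):
        out += [(ck + dk) * R2, (ck - dk) * R2]
    return out


def wavedec(x, levels):
    details, c = [], list(x)
    for _ in range(levels):
        c, d = haar_step(c)
        details.append(d)
    return [c] + details[::-1]                          # [c_k, d_k, ..., d_1]


def waverec(coeffs):
    c = coeffs[0]
    for d in coeffs[1:]:
        c = haar_inverse_step(c, d)
    return c


def noisy_sine(n=512, sigma=0.02, seed=0):
    rng = random.Random(seed)
    return [math.sin(2 * math.pi * i / n) + rng.gauss(0, sigma) for i in range(n)]


def wtha(signal, text, levels=5, pos=50, a=0.5, b=0.0):
    coeffs = wavedec(signal, levels)
    for i, ch in enumerate(text):
        coeffs[-1][pos + i] += a * ENC[ch] + b          # add T(code) to d_1
    return waverec(coeffs), len(text)


def wtra(s_tilde, l, levels=5, pos=50, a=0.5, b=0.0):
    d1 = wavedec(s_tilde, levels)[-1]                   # Step one: DWT of s~
    segment = d1[pos:pos + l]                           # Step two: extract at the known position
    code = [round((v - b) / a) for v in segment]        # Note 3: invert T, round to integer codes
    return "".join(DEC.get(c, "?") for c in code)       # letter database lookup
```

With the correct private parameters, `wtra(*wtha(noisy_sine(), msg))` returns `msg`. Shifting the position by one gives a different string.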
Next, the following example is given for illustrating wavelet text recovery algorithm.

Example 2
In this example, the text "Text Hiding and Recovery" is recovered from the synthesized signal $\tilde{s}$ obtained in Example 1.
Firstly, the signal $\tilde{s}$ is decomposed into five levels by the DWT, giving the low-frequency coefficient $c_5$ and the high-frequency coefficients $d_1, d_2, d_3, d_4, d_5$. The results are shown in Fig. 6. Compared with the result in Fig. 3, the high-frequency coefficient $d_1$ of $\tilde{s}$ differs from that of $s$. So the text information can be captured by processing $d_1$.
Secondly, according to the position information used in the wavelet text information hiding algorithm, the code array containing the text information is extracted from the high-frequency coefficient $d_1$. The array is as follows: ' [20,31,

Characteristics of hiding algorithm and recovery algorithm
In the wavelet text information hiding and recovery algorithms, the code of the text information is hidden by the noise ε. The code cannot easily be captured from the noise by wavelet denoising, because the code and the noise ε are mixed together. So the noise ε guarantees the security of the code of the text information. The main computational cost of both algorithms is that of the wavelet transform. If the length N of the signal equals $k_0 2^{J_0}$, where $k_0$ is a positive integer, the DWT can be computed with O(N) multiplications. Thus English text information can be hidden and recovered quickly by the wavelet text hiding and recovery algorithms, respectively. To illustrate the running time of the wavelet text hiding algorithm, 1000 English texts with different lengths are chosen to test the distribution of running time. All English texts are recovered accurately. The distribution of running time is shown in the boxplot of Fig. 7. The […] $4.5010 \times 10^{-7}$, $4.5364 \times 10^{-7}$ and $6.2184 \times 10^{-7}$, respectively. So the wavelet text recovery algorithm is also quick and stable. Compared with the hiding algorithm, the recovery algorithm is quicker.
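A back-of-the-envelope count shows where the O(N) bound comes from. It assumes an orthogonal filter pair of length L (L = 4 for 'db2'), the decimated DWT described in the Preliminary section, and the convention $c_0 = s$. The condition $N = k_0 2^{J_0}$ keeps every level's input length even for up to $J_0$ levels. Reconstruction requires the same number of multiplications.

```latex
\begin{aligned}
\text{level } j\ (1 \le j \le J):&\quad c_{j-1} \text{ has } N/2^{j-1} \text{ samples}, \quad c_j,\ d_j \text{ have } N/2^{j} \text{ samples each},\\
\text{multiplications at level } j:&\quad 2L \cdot \frac{N}{2^{j}},\\
\text{total}:&\quad \sum_{j=1}^{J} 2L\,\frac{N}{2^{j}} = 2LN\left(1 - 2^{-J}\right) < 2LN = O(N).
\end{aligned}
```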
In these algorithms, any wavelet function can be chosen to hide and recover the text information, but the choice is critical. Any wavelet function can be applied in text information hiding. However, in text information recovery, the wavelet function must be the same as the one used for hiding. Otherwise, the text information cannot be recovered accurately and quickly. For example, the wavelet 'db2' is chosen to hide the text information in Example 1. If the wavelet 'db3' is chosen to recover it, the decomposition results are as shown in Fig. 9, where the signal $\tilde{s}$ carrying the text information is again decomposed into five levels by the DWT. In Fig. 9, the low-frequency coefficient $c_5$ and the high-frequency coefficients $d_1, d_2, d_3, d_4, d_5$ are significantly different from those in Fig. 6, especially $d_1$. Following the text recovery approach, the recovered code is '[6,16,6,4,7,4,20,8,13,2,4,5,4,7,1,6,5,2,6,4,10,2,5]', and the corresponding text is 'FPFDGDTHMBDEDGAFEBFDJBE'. That is, the recovery of the text information fails, because the wavelet function in the recovery algorithm is inconsistent with that in the hiding algorithm.
To recover the text information from the signal $\tilde{s}$, one needs to know in which high-frequency coefficient the code is hidden. Without this information, more time is needed to recover the code. The number of wavelet decomposition levels must also be known, because the signal $\tilde{s}$ can be decomposed into up to $[\log_2 N]$ levels, where N is the length of $\tilde{s}$ and $[\cdot]$ denotes the integer part. The last important piece of information is the position at which the transformed code of the text information is added to the high-frequency coefficient. The embedding positions can be either continuous or intermittent.
If they are continuous, there are N − l + 1 possible embeddings, where l is the length of the text code. If they are intermittent, there are $P_N^l$ possible embeddings, where $P_N^l$ is the number of permutations, $P_N^l = \frac{N!}{(N-l)!} = N(N-1)(N-2)\cdots(N-l+1)$. An example illustrates intermittent embedding. The result $\tilde{s}$, obtained by embedding the text code intermittently into the signal s of Example 1, is shown in Fig. 10. Compared with Fig. 4, the text code is mixed into the error signal, and it is difficult to identify the embedding positions. The low-frequency coefficient $c_5$ and the high-frequency coefficients $d_1, d_2, d_3, d_4, d_5$ obtained by wavelet decomposition are shown in Fig. 11. Compared with $d_1$ in Figs. 3 and 6, there are too many differences in Fig. 11 to determine exactly which positions of $d_1$ have changed. Thus, with intermittent embedding, the code is hard to recover from the high-frequency coefficient $d_1$ without knowing the embedding positions. In short, these critical points together constitute a private key.
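The paper's experiment hides with 'db2' and recovers with 'db3'. The sketch below reproduces the same effect in a smaller, assumed setup:
- a one-level periodized orthogonal DWT (embedding in $d_1$ only needs the first level);
- 'db2' for hiding and Haar for recovery;
- hypothetical parameters: start position 50, $T(v) = 0.5v$, and the alphabetical letter table with the space mapped to 53.

The 'db2' low-pass coefficients are the standard Daubechies values. The periodized boundary handling is an assumption and need not match the paper's boundary mode.

```python
import math
import random
import string

S2, S3 = math.sqrt(2), math.sqrt(3)
LOWPASS = {                                           # orthogonal low-pass filters
    "haar": [1 / S2, 1 / S2],
    "db2": [(1 + S3) / (4 * S2), (3 + S3) / (4 * S2),
            (3 - S3) / (4 * S2), (1 - S3) / (4 * S2)],
}
LETTERS = string.ascii_uppercase + string.ascii_lowercase + " "   # assumed letter database
ENC = {ch: i + 1 for i, ch in enumerate(LETTERS)}
DEC = {v: k for k, v in ENC.items()}


def filters(name):
    h = LOWPASS[name]
    g = [(-1) ** n * h[len(h) - 1 - n] for n in range(len(h))]   # quadrature mirror high-pass
    return h, g


def dwt1(x, name):
    """One level of a periodized orthogonal DWT: returns (c_1, d_1)."""
    h, g = filters(name)
    N = len(x)
    c = [sum(h[n] * x[(2 * k + n) % N] for n in range(len(h))) for k in range(N // 2)]
    d = [sum(g[n] * x[(2 * k + n) % N] for n in range(len(g))) for k in range(N // 2)]
    return c, d


def idwt1(c, d, name):
    """Inverse of dwt1: the analysis matrix is orthogonal, so apply its transpose."""
    h, g = filters(name)
    N = 2 * len(c)
    x = [0.0] * N
    for k in range(len(c)):
        for n in range(len(h)):
            x[(2 * k + n) % N] += h[n] * c[k] + g[n] * d[k]
    return x


def hide(signal, text, name, pos=50, a=0.5):
    c, d = dwt1(signal, name)
    for i, ch in enumerate(text):
        d[pos + i] += a * ENC[ch]
    return idwt1(c, d, name)


def recover(s_tilde, l, name, pos=50, a=0.5):
    _, d = dwt1(s_tilde, name)
    return "".join(DEC.get(round(d[pos + i] / a), "?") for i in range(l))


def noisy_sine(n=512, sigma=0.02, seed=0):
    rng = random.Random(seed)
    return [math.sin(2 * math.pi * i / n) + rng.gauss(0, sigma) for i in range(n)]
```

Recovering with 'db2' returns the text. Recovering with Haar returns a different string, because a 'db2' detail atom projects onto two neighbouring Haar detail coefficients rather than one.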
According to the above discussion, the wavelet text hiding algorithm and the wavelet text recovery algorithm can be regarded as a public-key cryptosystem for text information.
Public key: signal $\tilde{s}$, length l of the text. Plaintext: Text Hiding and Recovery.
According to the above discussion, this method is consistent with the public-key mechanism and has the following characteristics. First, it has two kinds of keys: the public key is public, and the private key is secret. Second, deriving the private key from the public key is computationally infeasible. Third, information encrypted with the public key must be decrypted with the corresponding private key. Finally, information encrypted with a private key must be decrypted with the corresponding public key. Based on these critical points, designers or users can also design a personalized private key. These critical points ensure the security of the text information: since an invader must consider many variables, it is very difficult to decipher the text. Not all keys need to be transmitted; in the algorithm, only the synthetic signal and the length l are transmitted. The approach resembles a 'lock' with only one 'key'. The public key turns the text message into a 'lock', and all critical points together form the matching 'key', with each critical point equivalent to a bump on the 'key'. Only if every critical point is correct can the text message be obtained. In particular, if the wavelet filters are designed by a new algorithm, the text message is almost impossible to decipher, even if the hiding algorithm is known.
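The size of the position part of the private key can be computed directly. This sketch assumes the code is embedded in a band of n coefficients (for example $d_1$, which has N/2 entries for a signal of length N). The function names are hypothetical. Counting windows of l consecutive entries gives n − l + 1 choices. Ordered selections of l distinct entries give $P_n^l = n!/(n-l)!$. `math.perm` requires Python 3.8 or later.

```python
import math


def contiguous_embeddings(n, l):
    """Number of start positions for l consecutive coefficients among n."""
    return max(n - l + 1, 0)


def intermittent_embeddings(n, l):
    """Ordered choices of l distinct positions among n: P(n, l) = n!/(n - l)!."""
    return math.perm(n, l)
```

Consider a 512-sample signal and the 24-character text of Example 1, so that n = 256 and l = 24. Here `contiguous_embeddings` gives 233, while `intermittent_embeddings` exceeds $10^{50}$. This is why the intermittent mode is far harder to search.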
Thus, a complete text encryption and transmission approach can be built from the wavelet text hiding algorithm (WTHA) and the wavelet text recovery algorithm (WTRA). A system architecture figure for the proposed approach is shown in Fig. 12. The system includes a sender (TEXT HIDING) and a receiver (TEXT RECOVERY). The sender generates a synthesized signal $\tilde{s}$ from a signal s and a text M by the WTHA and obtains the length l of M. The sender then transmits $\tilde{s}$ and l to the receiver through the public network. The receiver runs the WTRA with the private key to recover the text M from the received signal $\tilde{s}$ and the length l.
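The sender/receiver split in Fig. 12 can be sketched as two functions. Only the pair $(\tilde{s}, l)$ crosses the public network, and the receiver additionally holds a private key. The sketch makes these assumptions:
- a one-level Haar DWT with embedding in $d_1$;
- the alphabetical letter table with the space mapped to 53 (assumed);
- intermittent positions drawn at random.

The `PrivateKey` fields (ordered positions in $d_1$ and the scale of $T(v) = \text{scale} \cdot v$) are hypothetical. The wavelet, level and band are fixed here rather than stored in the key.

```python
import math
import random
import string
from dataclasses import dataclass

R2 = 1 / math.sqrt(2)
LETTERS = string.ascii_uppercase + string.ascii_lowercase + " "   # assumed letter database
ENC = {ch: i + 1 for i, ch in enumerate(LETTERS)}
DEC = {v: k for k, v in ENC.items()}


def haar(x):
    """One-level Haar DWT: (c_1, d_1)."""
    n = len(x) // 2
    return ([(x[2 * k] + x[2 * k + 1]) * R2 for k in range(n)],
            [(x[2 * k] - x[2 * k + 1]) * R2 for k in range(n)])


def ihaar(c, d):
    out = []
    for ck, dk in zip(c, d):
        out += [(ck + dk) * R2, (ck - dk) * R2]
    return out


@dataclass(frozen=True)
class PrivateKey:
    """Hypothetical private key: ordered positions in d_1 and the scale of T."""
    positions: tuple
    scale: float = 0.5


def sender(signal, message, key):
    """TEXT HIDING: returns the public pair (synthesized signal, length l)."""
    code = [ENC[ch] for ch in message]
    if len(key.positions) < len(code):
        raise ValueError("private key has fewer positions than the message length")
    c, d = haar(signal)
    for p, v in zip(key.positions, code):
        d[p] += key.scale * v
    return ihaar(c, d), len(code)


def receiver(s_tilde, l, key):
    """TEXT RECOVERY: needs the public pair plus the private key."""
    _, d = haar(s_tilde)
    return "".join(DEC.get(round(d[p] / key.scale), "?") for p in key.positions[:l])


def noisy_sine(n=512, sigma=0.02, seed=0):
    rng = random.Random(seed)
    return [math.sin(2 * math.pi * i / n) + rng.gauss(0, sigma) for i in range(n)]
```

A key built from `random.Random(1).sample(range(256), 24)` lets `receiver` recover the message. A key that differs only in `scale` does not.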

Conclusion and discussion
Based on the wavelet digital watermarking method, an approach is given for hiding text information in a signal with white noise. An example and some figures illustrate how to hide text information in a signal with white noise ε, and the noise ε guarantees the security of the code of the text information. Moreover, a method is proposed to recover the text information from the synthetic signal. To recover the text information correctly, critical information must be known privately, including the wavelet function, the number of decomposition levels and the embedding position. Usual digital watermarks are used to protect the copyright and integrity of digital products such as digital images, video, audio or electronic documents, or to control their replication and tracking. The idea of the proposed algorithm is instead to hide text information via the wavelet digital watermarking method, protecting the watermark with a noisy digital signal. Here the watermark is the important text message that needs to be transmitted to others. In addition, unlike Morse code, this English text hiding and recovery method cannot be carried out by manual calculation and can only be done by computer. This is both a merit and a problem. How to hide and recover English text by manual calculation in extreme cases will be a very challenging research question in the future. Of course, there are many encoding methods, and alphabetical encoding is one of the simplest, so the text code or the embedding location can be encrypted to improve security. Moreover, with the advent and development of GPT, it remains open whether this approach can maintain its security. In other words, how can the algorithm be improved so that GPT cannot decipher it in limited time, even if GPT knows the algorithm? It has also recently been discovered that many different types of data might be hidden in the wavelet space at once, which is an even more interesting direction.
Further theoretical exploration is still under way. These new problems and challenges will be addressed in our follow-up study.

Data availability
All data generated or analyzed during this study are included in this published article and the uploaded file 'data.xls'.